21  Merging Dataframes

21.1 Introduction

Often, we wish to link different data sets to form one single data set.

For example, we might have a data set with individual player information (name, age, height), a data set with a series of observations for each player (obs 1, obs 2, obs 3), and a further data set with information about each observation session.

21.2 Create dataframes

# Load dplyr for the full join functionality
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
# Step 1: Create Data Frame 1
df1 <- data.frame(
  ID = 1:5,
  Name = c("Fred", "Bob", "Charlie", "David", "John"),
  Age = c(22, 21, 24, 19, 23)
)

# Step 2: Create Data Frame 2
# This data frame has a common field 'Name' with df1
df2 <- data.frame(
  Name = c("Fred", "Bob", "Charlie", "David", "John", 
           "Fred", "Bob", "Charlie", "David", "John", 
           "Fred", "Bob", "Charlie", "David", "John"),
  Obs = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3),
  Val = c(10, 20, 30, 40, 50, 
          15, 25, 35, 45, 55, 
          20, 30, 40, 50, 60)
)

# Step 3: Create Data Frame 3
# This data frame also has a common field 'Obs' with df2
df3 <- data.frame(
  Obs = c(1, 2, 3, 4),
  Day = c("Monday", "Wednesday", "Thursday", "Friday"),
  Weather = c("Excellent", "Good", "Average", "Good")
)

These look like:

# Display the data frames
print("Data Frame 1:")
[1] "Data Frame 1:"
print(df1)
  ID    Name Age
1  1    Fred  22
2  2     Bob  21
3  3 Charlie  24
4  4   David  19
5  5    John  23
print("Data Frame 2:")
[1] "Data Frame 2:"
print(df2)
      Name Obs Val
1     Fred   1  10
2      Bob   1  20
3  Charlie   1  30
4    David   1  40
5     John   1  50
6     Fred   2  15
7      Bob   2  25
8  Charlie   2  35
9    David   2  45
10    John   2  55
11    Fred   3  20
12     Bob   3  30
13 Charlie   3  40
14   David   3  50
15    John   3  60
print("Data Frame 3:")
[1] "Data Frame 3:"
print(df3)
  Obs       Day   Weather
1   1    Monday Excellent
2   2 Wednesday      Good
3   3  Thursday   Average
4   4    Friday      Good

22 Merge dataframes

Now, we can merge the datasets using their common fields:

# Step 4: Merge Data Frame 1 and Data Frame 2 on 'ID'
merged_df1_df2 <- merge(df1, df2, by = "Name", all = FALSE)  # inner join

# Step 5: Merge the result with Data Frame 3 on 'ID'
final_df <- merge(merged_df1_df2, df3, by = "Obs", all = FALSE)  # inner join

# Display the merged data frame
print("Final Merged Data Frame (Inner Join):")
[1] "Final Merged Data Frame (Inner Join):"
print(final_df)
   Obs    Name ID Age Val       Day   Weather
1    1     Bob  2  21  20    Monday Excellent
2    1   David  4  19  40    Monday Excellent
3    1 Charlie  3  24  30    Monday Excellent
4    1    John  5  23  50    Monday Excellent
5    1    Fred  1  22  10    Monday Excellent
6    2     Bob  2  21  25 Wednesday      Good
7    2   David  4  19  45 Wednesday      Good
8    2 Charlie  3  24  35 Wednesday      Good
9    2    John  5  23  55 Wednesday      Good
10   2    Fred  1  22  15 Wednesday      Good
11   3     Bob  2  21  30  Thursday   Average
12   3    John  5  23  60  Thursday   Average
13   3 Charlie  3  24  40  Thursday   Average
14   3   David  4  19  50  Thursday   Average
15   3    Fred  1  22  20  Thursday   Average

Alternatively, if you want a full outer join to keep all rows from all data frames:

final_df_full <- df1 %>%
  full_join(df2, by = "Name") %>%
  full_join(df3, by = "Obs")

# Display the full outer join result
print("Final Merged Data Frame (Full Outer Join):")
[1] "Final Merged Data Frame (Full Outer Join):"
print(final_df_full)
   ID    Name Age Obs Val       Day   Weather
1   1    Fred  22   1  10    Monday Excellent
2   1    Fred  22   2  15 Wednesday      Good
3   1    Fred  22   3  20  Thursday   Average
4   2     Bob  21   1  20    Monday Excellent
5   2     Bob  21   2  25 Wednesday      Good
6   2     Bob  21   3  30  Thursday   Average
7   3 Charlie  24   1  30    Monday Excellent
8   3 Charlie  24   2  35 Wednesday      Good
9   3 Charlie  24   3  40  Thursday   Average
10  4   David  19   1  40    Monday Excellent
11  4   David  19   2  45 Wednesday      Good
12  4   David  19   3  50  Thursday   Average
13  5    John  23   1  50    Monday Excellent
14  5    John  23   2  55 Wednesday      Good
15  5    John  23   3  60  Thursday   Average
16 NA    <NA>  NA   4  NA    Friday      Good